17 research outputs found

    Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting

    Keyword spotting (KWS) constitutes a major component of human-technology interfaces. The goals for KWS are to maximize detection accuracy at a low false-alarm (FA) rate while minimizing footprint size, latency, and complexity. Towards achieving them, we study Convolutional Recurrent Neural Networks (CRNNs). Inspired by large-scale state-of-the-art speech recognition systems, we combine the strengths of convolutional layers and recurrent layers to exploit local structure and long-range context. We analyze the effect of architecture parameters and propose training strategies to improve performance. With only ~230k parameters, our CRNN model yields acceptably low latency, and achieves 97.71% accuracy at 0.5 FA/hour for a 5 dB signal-to-noise ratio. Comment: Accepted to Interspeech 2017.
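    As a concrete illustration of the conv-then-recurrent pattern the abstract describes, a minimal PyTorch sketch follows. It assumes log-mel input features; the layer sizes, kernel shapes, and GRU widths are illustrative choices, not the paper's exact ~230k-parameter configuration.

```python
import torch
import torch.nn as nn


class CRNNKeywordSpotter(nn.Module):
    """Convolution over time-frequency features, then a GRU over time,
    then a per-utterance classification head. Sizes are illustrative."""

    def __init__(self, n_mels=40, n_classes=2, conv_channels=32,
                 rnn_hidden=64, rnn_layers=2):
        super().__init__()
        # 2-D convolution exploits local time-frequency structure.
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_channels, kernel_size=(20, 5), stride=(8, 2)),
            nn.BatchNorm2d(conv_channels),
            nn.ReLU(),
        )
        # GRU layers capture long-range temporal context.
        conv_freq_out = (n_mels - 20) // 8 + 1
        self.rnn = nn.GRU(
            input_size=conv_channels * conv_freq_out,
            hidden_size=rnn_hidden,
            num_layers=rnn_layers,
            batch_first=True,
            bidirectional=True,
        )
        self.fc = nn.Linear(2 * rnn_hidden, n_classes)

    def forward(self, x):  # x: (batch, time, n_mels) log-mel features
        x = x.unsqueeze(1).transpose(2, 3)               # (batch, 1, n_mels, time)
        x = self.conv(x)                                 # (batch, C, F', T')
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)   # (batch, T', C*F')
        out, _ = self.rnn(x)
        return self.fc(out[:, -1])                       # logits per keyword class


if __name__ == "__main__":
    model = CRNNKeywordSpotter()
    logits = model(torch.randn(4, 151, 40))  # 4 example clips, 151 frames each
    print(logits.shape)                      # torch.Size([4, 2])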

    SlimPajama-DC: Understanding Data Combinations for LLM Training

    This paper aims to understand the impacts of various data combinations (e.g., web text, Wikipedia, GitHub, books) on the training of large language models using SlimPajama. SlimPajama is a rigorously deduplicated, multi-source dataset, refined and further deduplicated to 627B tokens from the 1.2T-token RedPajama dataset contributed by Together. We term this research SlimPajama-DC, an empirical analysis designed to uncover fundamental characteristics and best practices associated with employing SlimPajama in the training of large language models. During our research with SlimPajama, two pivotal observations emerged: (1) Global deduplication vs. local deduplication. We analyze and discuss how global (across different dataset sources) and local (within a single source) deduplication affect the performance of trained models. (2) Proportions of high-quality, highly deduplicated multi-source data in the combination. To study this, we construct six configurations of the SlimPajama dataset and train each with a 1.3B Cerebras-GPT model using ALiBi and SwiGLU. Our best configuration outperforms the 1.3B model trained on RedPajama with the same number of training tokens by a significant margin. All our 1.3B models are trained on a Cerebras 16× CS-2 cluster with a total of 80 PFLOP/s in bf16 mixed precision. We further extend our findings (such as that increasing data diversity is crucial after global deduplication) to a 7B model with large-batch-size training. Our models and the separate SlimPajama-DC datasets are available at https://huggingface.co/MBZUAI-LLM and https://huggingface.co/datasets/cerebras/SlimPajama-627B. Comment: Technical report. Hugging Face: https://huggingface.co/MBZUAI-LLM and https://huggingface.co/datasets/cerebras/SlimPajama-627B.
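    The global-vs-local deduplication distinction can be made concrete with a short sketch. This is a simplified stand-in that uses exact hash matching over whitespace-normalized text; SlimPajama's actual pipeline performs large-scale fuzzy deduplication, and the toy corpora below are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Exact-match fingerprint over normalized text (a simplified stand-in
    for the fuzzy, MinHash-style deduplication used at dataset scale)."""
    return hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()


def dedup_local(corpora: dict[str, list[str]]) -> dict[str, list[str]]:
    """Local deduplication: duplicates are removed only within each source."""
    out = {}
    for source, docs in corpora.items():
        seen, kept = set(), []
        for doc in docs:
            fp = fingerprint(doc)
            if fp not in seen:
                seen.add(fp)
                kept.append(doc)
        out[source] = kept
    return out


def dedup_global(corpora: dict[str, list[str]]) -> dict[str, list[str]]:
    """Global deduplication: one fingerprint set is shared across all sources,
    so a document repeated in, say, web text and books is kept only once."""
    seen = set()
    out = defaultdict(list)
    for source, docs in corpora.items():
        for doc in docs:
            fp = fingerprint(doc)
            if fp not in seen:
                seen.add(fp)
                out[source].append(doc)
    return dict(out)


if __name__ == "__main__":
    corpora = {
        "web": ["the cat sat", "the cat sat", "a unique web page"],
        "books": ["the cat sat", "a unique book passage"],
    }
    print({k: len(v) for k, v in dedup_local(corpora).items()})   # {'web': 2, 'books': 2}
    print({k: len(v) for k, v in dedup_global(corpora).items()})  # {'web': 2, 'books': 1}
```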

    Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

    We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs). The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts, including source code in various programming languages. With 13 billion parameters, they demonstrate better knowledge and reasoning capabilities in Arabic than any existing open Arabic or multilingual model by a sizable margin, based on extensive evaluation. Moreover, the models are competitive in English with English-centric open models of similar size, despite being trained on much less English data. We provide a detailed description of the training, the tuning, the safety alignment, and the evaluation of the models. We release two open versions of the model -- the foundation Jais model and an instruction-tuned Jais-chat variant -- with the aim of promoting research on Arabic LLMs. Available at https://huggingface.co/inception-mbzuai/jais-13b-chat. Comment: Arabic-centric, foundation model, large language model, LLM, generative model, instruction-tuned, Jais, Jais-chat.
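    A minimal inference sketch for the released checkpoint follows, assuming the Hugging Face transformers library. The trust_remote_code flag, generation settings, and plain-string prompt are assumptions for illustration; the instruction-tuned model expects the chat prompt format documented on its model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inception-mbzuai/jais-13b-chat"  # checkpoint named in the abstract

# Assumption: the checkpoint ships a custom decoder-only architecture,
# so remote code is enabled when loading it.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

prompt = "ما هي عاصمة الإمارات؟"  # "What is the capital of the UAE?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```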

    Reducing the SPEC2006 Benchmark Suite for Simulation-Based Abstract Computer Architecture Research

    Present-day computer architects use advanced microarchitecture simulators to test the performance of processor designs. The simulator workloads are generally benchmarks, which are representative of specific types of real-world applications. Because microarchitecture implementations increase in complexity and the simulation workloads are required to represent complicated applications, the simulation time has greatly increased. To solve this problem, researchers are looking into ways to reduce the amount of time benchmarks run while maintaining the same workload characterization as the longer benchmarks. MinneSPEC is a representative reduction of SPEC2000, with the reduced input sets found using SimpleScalar profiling tools [1]. With the release of SPEC CPU2006, new benchmarks have been added to the SPEC benchmarking suite which will be used to evaluate performance in tomorrow's microprocessors. These benchmarks are considerably larger than SPEC2000, and using SimpleScalar to profile their workloads would take a large amount of time and effort. This paper suggests a different reduction technique which gathers profiling information using processor performance counters accessed through PAPI. Since workloads run on a native system instead of a simulator, profiling information can be gathered in a much shorter amount of time. This allows for fine-grained tuning of reduced input sets, so more representative reduced benchmarks can be found in a much shorter amount of time. Using this technique, we were able to reduce five SPEC2006 benchmarks to under 1
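    The core idea -- characterizing a workload from hardware performance counters gathered on native hardware, then checking how closely a reduced input tracks the full run -- can be sketched as follows. The paper reads counters through PAPI; this sketch substitutes the Linux perf tool as a readily scriptable stand-in, and the event list, normalization, and distance measure are illustrative choices, not the authors' methodology. The benchmark command and input files are hypothetical.

```python
import subprocess

EVENTS = ["instructions", "cycles", "branch-misses", "cache-misses"]


def counter_profile(cmd: list[str]) -> dict[str, float]:
    """Run `cmd` under perf stat and return event counts normalized per
    instruction, giving a size-independent workload characterization."""
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", ",".join(EVENTS), "--"] + cmd,
        capture_output=True, text=True, check=True,
    )
    counts = {}
    for line in result.stderr.splitlines():  # perf stat reports on stderr
        fields = line.split(",")
        if len(fields) < 3 or not fields[0].strip().isdigit():
            continue                          # skip headers / <not counted>
        value, name = float(fields[0]), fields[2]
        for event in EVENTS:
            if name.startswith(event):
                counts[event] = value
    instructions = counts.get("instructions", 1.0)
    return {event: counts.get(event, 0.0) / instructions for event in EVENTS}


def profile_distance(full: dict[str, float], reduced: dict[str, float]) -> float:
    """Smaller distance => the reduced input behaves more like the full run."""
    return sum(abs(full[e] - reduced[e]) for e in EVENTS)


if __name__ == "__main__":
    full = counter_profile(["./benchmark", "full_input.cfg"])        # hypothetical
    reduced = counter_profile(["./benchmark", "reduced_input.cfg"])  # hypothetical
    print("profile distance:", profile_distance(full, reduced))
```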